ABSTRACT

Functional inference recommends data analysis of a sample of n observations
by functional and graphical representations of its probability models
using various functions on 0 < u < 1, including the quantile function. This
paper discusses: change PP plots and a continuous version of the sample
quantile function which use the mid-distinct values probability integral
transform; comparison density functions; comparison interpretation of
probability integral transform; maximum spacings method of one sample
parameter estimation.

1.  My  15th  Anniversary  of  Texas  A&M  and  Functional  Statistical 
Inference 

As  the  Department  of  Statistics  at  Texas  A&M  University  celebrates  its 
30th  anniversary  in  1992,  each  of  us  who  are  part  of  the  department  may 
want  to  celebrate  our  personal  anniversaries  marking  the  length  and  depth 
of  our  association.  In  1992  I  am  completing  15  years  of  happy  and  deep 
association (since 1978) with Texas A&M. I would like to thank my colleagues
for providing an enjoyable and stimulating environment, and proving that
there is great life in College Station (as well as opportunities to travel to help
maintain our department's national and international visibility). We can all
take pride in 1992 in the fact that the Department of Statistics at Texas
A&M  has  a  reputation  as  one  of  the  outstanding  and  very  active  statistics 
programs. 

In 1978 my research emphasis expanded from time series analysis to
functional statistical inference. I feel that time series analysis is important
not only to provide analysis of time series data, but also to provide
a background suitable for new approaches to mainstream statistical problems.
What I call functional inference recommends data analysis of a sample
of n observations by functional (and therefore graphical) representations of
its statistical properties (probability models), using various functions on the
unit interval 0 < u < 1. An important function is the quantile function Q(u),
0 < u < 1, or inverse distribution function. Functional inference (introduced
in Parzen (1979)) can be regarded as applying time series theoretical and
function smoothing ideas to classical statistics.

What  will  be  the  benefits  of  functional  statistical  inference  to  applied 
statisticians?  I  believe  that  they  will  include  (1)  unification  of  statistical 
methods  for  discrete  and  continuous  random  variables,  (2)  change  analysis, 
(3)  information  theory  approaches  to  statistical  inference  (see  Parzen  (1989), 
(1991),  (1991),  (1992)). 

Unification may have the most difficulty arousing interest from applied
statisticians; its philosophy is that statistical problems should be solved in
several ways (when I ask graduate students what are several ways to solve a
problem in a textbook they usually tell me there is only one way!). Change
analysis (on which my research interests have bloomed since December 1990)
is an extension of changepoint analysis whose initial goal is to determine
if a probability model fitted to a whole sample $Y_1, \ldots, Y_n$ fits all
subsamples $Y_1, \ldots, Y_m$ for all m < n. Information theory is important to
statistics because it provides measures of divergence between two probability
distributions.


The  research  of  Eubank,  LaRiccia,  and  Hart  (1992)  can  be  considered 
to  be  fundamental  research  on  functional  statistical  inference.  The  practice 
of  statistics  would  be  enhanced  by  applying  their  deep  insights  about  the 
relations  between  goodness  of  fit  tests  and  nonparametric  regression. 


2.  One  Sample  Probability  Model  Fitting 

A basic problem of statistics is fitting probability models to a sample
$Y_1, \ldots, Y_n$ of a continuous random variable $Y$ with true distribution function

$$F_Y(y) = \mathrm{Prob}[Y \le y],$$

probability density function $f_Y(y) = F_Y'(y)$, quantile function

$$Q_Y(u) = F_Y^{-1}(u).$$


The parametric approach to modeling a random sample assumes a parametric
probability model $f_\theta(y)$ indexed by a vector parameter $\theta$ with k components.

Classical statistics assumes suitable regularity conditions on the parametric
family of probability densities in order to assure desirable properties
of parameter estimators formed by maximum likelihood estimators $\hat\theta$. These
are defined to maximize the log likelihood

$$L(\theta) = \sum_{t=1}^{n} \log f_\theta(Y_t),$$

and are usually computed as solutions of the estimating equations

$$\sum_{t=1}^{n} S_{\theta_j}(Y_t; \theta) = 0, \quad j = 1, \ldots, k,$$

where $S_{\theta_j}(y; \theta)$ is the partial derivative of $\log f_\theta(y)$ with respect to the
jth component $\theta_j$ of $\theta$.

Goodness of fit of the parametric model to the data is tested by forming
the transformed data

$$W_t = F_{\hat\theta}(Y_t)$$

and testing whether the discrete distribution function, denoted $F_W^{(n)}(w)$,
0 < w < 1, of $W_1, \ldots, W_n$ is significantly different from the distribution function
of U, a Uniform[0,1] random variable. Functional inference converts this
question into a problem about the detection of signal in noise in data that
is a process on 0 < w < 1. Under the null hypothesis that the parametric
model fits, the limit distribution of $n^{1/2}(F_W^{(n)}(w) - w)$ is a Brownian Bridge
$B(w)$, 0 < w < 1, modified for the effect of parameter estimation (Shorack
and Wellner (1986)).

Goal  One  of  this  paper  is  to  define  continuous  versions  of  sample  dis¬ 
tribution  functions  that  help  provide  tests  of  goodness  of  fit  which  provide 
non-parametric  estimators  of  /y  when  a  parametric  model  does  not  fit. 

3. Comparison and information divergence of probability distributions

Goal  Two  of  this  paper  is  to  raise  statisticians’  consciousness  about 
the  concept  of  comparison  density  function  d(u;F,  G),  0  <  u  <  1,  of  two 
distributions  F  and  G  (see  Parzen  (1992)). 

For F and G continuous, we define the comparison distribution

$$D(u; F, G) = G(F^{-1}(u)), \quad 0 < u < 1,$$

and comparison density

$$d(u; F, G) = g(F^{-1}(u)) / f(F^{-1}(u)), \quad 0 < u < 1.$$

The graph of D(u; F, G) is called a PP-plot because it is a plot of (F(y),
G(y)) which compares the P values of an observation y under the two
distributions.
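
As an illustration of the continuous-case definitions, the following Python sketch evaluates D(u; F, G) and d(u; F, G) for two normal distributions. The choice F = N(0, 1), G = N(0.5, 1) and the function names are assumptions made for the example; they are not taken from the paper.

```python
from statistics import NormalDist

def comparison_distribution(u, F, G):
    """D(u; F, G) = G(F^{-1}(u)) for continuous F and G."""
    return G.cdf(F.inv_cdf(u))

def comparison_density(u, F, G):
    """d(u; F, G) = g(F^{-1}(u)) / f(F^{-1}(u))."""
    y = F.inv_cdf(u)
    return G.pdf(y) / F.pdf(y)

# Illustrative choice (an assumption, not from the paper)
F = NormalDist(0.0, 1.0)
G = NormalDist(0.5, 1.0)

# PP-plot points (u, D(u; F, G)) on a coarse grid of u values
pp_points = [(k / 10, comparison_distribution(k / 10, F, G)) for k in range(1, 10)]
```

When G = F the comparison distribution reduces to D(u) = u and the comparison density to 1, which is the reference case against which departures are judged.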

For F and G discrete with respective probability mass functions $p_F$ and
$p_G$ we first define the comparison density

$$d(u; F, G) = p_G(F^{-1}(u)) / p_F(F^{-1}(u)), \quad 0 < u < 1.$$

The comparison distribution is then defined as the integral

$$D(u; F, G) = \int_0^u d(t; F, G)\,dt, \quad 0 < u < 1.$$

An equivalent way to describe D(u; F, G) in the discrete case is D(u; F, G) =
$G(F^{-1}(u))$ at F-exact u satisfying $F(F^{-1}(u)) = u$, and D(u; F, G) is defined
at other values of u by linear interpolation between its values at F-exact values
of u. When a PP-plot D(u; F, G) is determined by linear interpolation
we call the values determining the plot the PP-plot values; thus the PP-plot
values are $G(F^{-1}(u))$ for all values of u that are F-exact.

Goal Three of this paper is to recommend Change PP plots which plot
(F(y), G(y) - F(y)) or equivalently (u, D(u; F, G) - u).

Implicit in our definitions are assumptions that guarantee that D(0; F, G) =
0, D(1; F, G) = 1. Therefore d(u; F, G) is a density, a non-negative function
integrating to 1.

Goal  Four  of  this  paper  is  to  remind  statisticians  that  very  useful  mea¬ 
sures  of  divergence  of  D(u)  from  u  are  Renyi  information  measures  (Renyi 
(1961))  of  the  divergence  of  d(u)  from  1.  They  provide  “entropy  detectors” 
to  be  used  in  addition  to  “non-linear  detectors”  and  “linear  detectors”  which 
use  norms  of  D(u)  —  u. 

For a density d(u), 0 < u < 1, Renyi information of index $\lambda$ is defined
for $\lambda$ not equal to 0 or -1 by

$$IR_\lambda(d) = \frac{2}{\lambda(1+\lambda)} \log \int_0^1 d(u)^{1+\lambda}\,du.$$

For $\lambda$ equal to 0 or -1, define

$$IR_0(d) = 2 \int_0^1 d(u) \log d(u)\,du,$$

$$IR_{-1}(d) = -2 \int_0^1 \log d(u)\,du.$$

Hellinger information corresponds to $\lambda = -.5$:

$$IR_{-.5}(d) = -8 \log \int_0^1 d(u)^{1/2}\,du.$$

A very useful identity is

$$IR_\lambda(d(u; F, G)) = IR_{-(1+\lambda)}(d(u; G, F)).$$
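
The Renyi information measures can be approximated numerically. The sketch below is an illustration only: it uses a midpoint rule on (0, 1), and the two normal distributions, the grid size m, and all function names are assumptions made for the example. It evaluates the information of index 1 for d(u; F, G) and the information of index -2 for d(u; G, F), which should agree up to quadrature error.

```python
from math import log
from statistics import NormalDist

def renyi_information(d, lam, m=20000):
    """Approximate IR_lam(d) for a density d on (0, 1) by the midpoint rule."""
    us = [(k + 0.5) / m for k in range(m)]
    if lam == 0:
        return 2.0 * sum(d(u) * log(d(u)) for u in us) / m
    if lam == -1:
        return -2.0 * sum(log(d(u)) for u in us) / m
    integral = sum(d(u) ** (1.0 + lam) for u in us) / m
    return 2.0 / (lam * (1.0 + lam)) * log(integral)

def comparison_density(F, G):
    """Return the function u -> d(u; F, G) = g(F^{-1}(u)) / f(F^{-1}(u))."""
    def d(u):
        y = F.inv_cdf(u)
        return G.pdf(y) / F.pdf(y)
    return d

# Illustrative distributions (an assumption, not from the paper)
F = NormalDist(0.0, 1.0)
G = NormalDist(0.5, 1.0)
d_FG = comparison_density(F, G)
d_GF = comparison_density(G, F)

lam = 1.0
lhs = renyi_information(d_FG, lam)            # index lam, d(u; F, G)
rhs = renyi_information(d_GF, -(1.0 + lam))   # index -(1 + lam), d(u; G, F)
```

The midpoint rule is chosen because it never evaluates the integrand at u = 0 or u = 1, where the quantile function is infinite.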


4.  Comparison  interpretation  of  probability  integral  transform 

An important application of comparison concepts is to interpret explicit
formulas for the true distribution and true quantile function of the probability
integral transform $W = F_\theta(Y)$ assuming Y is continuous and the parametric
model is continuous. One can show that

$$Q_W(u) = F_\theta(Q_Y(u)) = D(u; F_Y, F_\theta),$$

$$F_W(w) = F_Y(Q_\theta(w)) = D(w; F_\theta, F_Y).$$

Goal Five of this paper is to recommend that the divergence (comparison)
between two distribution measures $F_Y$ and $F_\theta$ be measured by the
divergence from $D_0(u) = u$ of the comparison distribution functions $D(u; F_Y, F_\theta)$
or $D(u; F_\theta, F_Y)$.

To illustrate the different roles played by the two possible comparison
distribution functions we note that (1) for estimation of the parameter one
chooses $\hat\theta$ as the value of $\theta$ making the quantile function $D(u; F_Y, F_\theta)$ close
to u, while (2) for goodness of fit, one tests if the distribution function
$D(u; F_\theta, F_Y)$ is close to u.

5. Sample quantile functions

Goal Six of this paper is to emphasize to applied statisticians our opinion
that the first step in data analysis should be to form the sample quantile
function $Q^{(n)}(u)$, 0 < u < 1, which is the inverse of the sample distribution
function $F^{(n)}(y)$, $-\infty < y < \infty$. To compute it one determines $u_j$, $v_j$ for
j = 1, ..., c, where (1) the distinct values in the sample are denoted $v_j$,
j = 1, ..., c, and (2) the cumulative relative frequencies are denoted

$$u_j = F^{(n)}(v_j) = \text{fraction of sample} \le v_j.$$

Note $u_c = 1$; define $u_0 = 0$. If all values in the sample are distinct, c = n and
the distinct values are the order statistics Y(1; n) < ... < Y(n; n).

The sample quantile function $Q^{(n)}(u)$, the inverse of the sample distribution
function $F^{(n)}(y)$, can be calculated by

$$Q^{(n)}(u) = v_j, \quad u_{j-1} < u \le u_j,$$

or equivalently it is piecewise constant left continuous satisfying

$$Q^{(n)}(u_j) = v_j, \quad j = 1, \ldots, c.$$

The sample median and quartiles are defined to be the values at u = .5, .25,
.75 of $Q^{(n)}(u)$.
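
The computation of the distinct values, the cumulative relative frequencies, and the sample quantile function can be sketched in Python as follows. This is a minimal illustration; the function names and the four-point sample are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def distinct_values_and_cum_freqs(sample):
    """Return (v, u): sorted distinct values v_1..v_c and cumulative
    relative frequencies u_1..u_c (u_j = fraction of sample <= v_j)."""
    n = len(sample)
    counts = Counter(sample)
    v = sorted(counts)
    u, running = [], 0
    for value in v:
        running += counts[value]
        u.append(running / n)
    return v, u

def sample_quantile(sample, p):
    """Q(p) = v_j for u_{j-1} < p <= u_j (piecewise constant, left continuous)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    v, u = distinct_values_and_cum_freqs(sample)
    # bisect_left finds the first j with u_j >= p, which is the left-continuous rule
    return v[bisect_left(u, p)]

sample = [3, 1, 2, 2]  # hypothetical data containing one tie
v, u = distinct_values_and_cum_freqs(sample)
median = sample_quantile(sample, 0.5)
```

Using `bisect_left` rather than `bisect_right` is what makes the function left continuous: at p exactly equal to a cumulative frequency the lower distinct value is returned.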

A nonparametric measure of location is the sample median. A nonparametric
measure of scale is the quartile deviation

$$QD^{(n)} = 2\,IQR^{(n)},$$

defined as twice the interquartile range

$$IQR^{(n)} = Q^{(n)}(.75) - Q^{(n)}(.25).$$

An important characteristic of a distribution is its behavior at the tails or
ends of the distribution; we like to joke that "in statistics the ends do justify
the means" (to adaptively efficiently estimate location parameters one must
estimate tail shape parameters). Our experience is that tail behavior can be
judged (in a quick and dirty "back of an envelope" way) from the values near
0 and 1 of the identification quantile function (Parzen (1983))

$$QI^{(n)}(u) = (Q^{(n)}(u) - Q^{(n)}(.5)) / QD^{(n)}.$$

Intuitively, the identification quantile function is normalized to have at u = .5
value 0 and slope approximately 1.

Goal Seven of this paper is to remind statisticians that there is an extensive
literature on the important question of whether one should use a
nonparametrically smoothed sample quantile function rather than
the raw sample quantile function at the initial stage of analysis.

A  comprehensive  survey  and  exhaustive  analysis  of  properties  of  smoothed 
sample  quantile  functions  is  given  in  the  outstanding  Ph.D.  thesis  of  Cheng 
Cheng  (1993).  In  this  paper  I  discuss  my  proposal  for  a  quick  and  dirty 
smoothing  provided  by  a  continuous  version  of  the  sample  quantile  function. 

6. Continuous versions of sample quantile and distribution functions,
and Change PP Plots

The sample distribution function of data is discrete. Goal Eight of
this paper is to propose that to estimate a continuous distribution function
we first form a continuous version as follows. Define mid-values $v_j^*$,
j = 1, ..., c-1, by

$$v_j^* = .5(v_j + v_{j+1}).$$

We do not propose a universal definition of $v_0^*$ and $v_c^*$. Initially we define
$v_0^* = v_1$, $v_c^* = v_c$.

Define $F^{c(n)}$ and $Q^{c(n)}$ to be piecewise linear between their values (for
j = 0, ..., c)

$$F^{c(n)}(v_j^*) = u_j, \quad Q^{c(n)}(u_j) = v_j^*.$$
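
A minimal Python sketch of this continuous version follows. It assumes the initial endpoint convention stated above (the lowest node at the smallest distinct value, the highest node at the largest) and at least two distinct values in the sample; the function names and the sample are hypothetical.

```python
from collections import Counter

def continuous_version_nodes(sample):
    """Nodes (v*_j, u_j), j = 0..c: mid-values between adjacent distinct
    values, with end nodes at the smallest and largest distinct values."""
    n = len(sample)
    counts = Counter(sample)
    v = sorted(counts)
    u, running = [0.0], 0
    for value in v:
        running += counts[value]
        u.append(running / n)
    v_star = [v[0]] + [0.5 * (a + b) for a, b in zip(v, v[1:])] + [v[-1]]
    return v_star, u

def interpolate(x, xs, ys):
    """Piecewise linear interpolation through (xs[i], ys[i]); xs increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def F_c(sample, y):
    """Continuous version of the sample distribution function."""
    v_star, u = continuous_version_nodes(sample)
    return interpolate(y, v_star, u)

def Q_c(sample, p):
    """Continuous version of the sample quantile function (inverse of F_c)."""
    v_star, u = continuous_version_nodes(sample)
    return interpolate(p, u, v_star)

sample = [1, 2, 2, 3]  # hypothetical data
```

For this sample the nodes are (1, 0), (1.5, .25), (2.5, .75), (3, 1), so the continuous distribution function takes the value .5 at the tied value 2.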

Goal Nine of this paper is to note that our proposed continuous version
may be regarded as related to another important modification of a discrete
distribution (which we call the mid-distribution) that is being increasingly
recognized as the way to express P-levels of significance tests (see Routledge
(1992), Upton (1992)). The mid-distribution function of the sample is defined
by

$$F^{mid(n)}(y) = F^{(n)}(y) - .5\,p^{(n)}(y)$$

where $p^{(n)}(y)$ is the fraction of the sample equal to y. One expects that
approximately

$$F^{c(n)}(v_j) = (u_j + u_{j-1})/2,$$

which is the value $F^{mid(n)}(v_j)$ of the mid-distribution function at $v_j$.
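
The mid-distribution function is simple to compute directly from its definition. The following Python sketch is an illustration with hypothetical function names and data.

```python
def sample_distribution(sample, y):
    """Fraction of the sample less than or equal to y."""
    return sum(1 for x in sample if x <= y) / len(sample)

def sample_mass(sample, y):
    """Fraction of the sample equal to y."""
    return sum(1 for x in sample if x == y) / len(sample)

def mid_distribution(sample, y):
    """Sample distribution at y minus half the sample mass at y."""
    return sample_distribution(sample, y) - 0.5 * sample_mass(sample, y)

sample = [1, 2, 2, 3]  # hypothetical data
mid_at_tie = mid_distribution(sample, 2)
```

At a distinct value the result is the average of the cumulative relative frequencies on either side of it; away from the observed values it coincides with the ordinary sample distribution function.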

Goal Ten of this paper is to propose that the problem of goodness of
fit and parameter estimation of the parametric model $F_\theta$ be treated as one
of comparing with the Uniform[0,1] distribution $D_0(u) = u$ the continuous
comparison distribution functions defined in terms of the PP-plot values $u_j$
and

$$w_j(\theta) = F_\theta(Q^{c(n)}(u_j)),$$

which we call the "distinct mid-value probability transform" since for
j = 1, ..., c-1

$$w_j(\theta) = F_\theta((v_j + v_{j+1})/2).$$

Define the quantile-type PP-plot $D^c(u; F^{(n)}, F_\theta)$ as piecewise linear
connecting

$$(0, 0), (u_1, w_1(\theta)), \ldots, (u_{c-1}, w_{c-1}(\theta)), (1, 1).$$

Define the distribution-type PP plot $D^c(u; F_\theta, F^{(n)})$ as piecewise linear
connecting

$$(0, 0), (w_1(\theta), u_1), \ldots, (w_{c-1}(\theta), u_{c-1}), (1, 1).$$

In practice we recommend plotting Change PP plots of $(u_j, w_j(\theta) - u_j)$
and comparison densities $d^c(u; F^{(n)}, F_\theta)$ and $d^c(u; F_\theta, F^{(n)})$.
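
The PP-plot values and the Change PP plot points can be computed as in the Python sketch below. The Uniform[1, 3] model, the sample, and the function names are assumptions chosen so that the result is easy to check by hand; any model distribution function could be passed in.

```python
from collections import Counter

def pp_plot_values(sample, model_cdf):
    """Return (u, w) for j = 1..c-1: cumulative relative frequencies u_j and
    the model distribution function evaluated at the mid-values."""
    n = len(sample)
    counts = Counter(sample)
    v = sorted(counts)
    u, running = [], 0
    for value in v[:-1]:
        running += counts[value]
        u.append(running / n)
    w = [model_cdf(0.5 * (a + b)) for a, b in zip(v, v[1:])]
    return u, w

def quantile_type_pp(sample, model_cdf):
    """Vertices of the quantile-type PP plot, from (0, 0) to (1, 1)."""
    u, w = pp_plot_values(sample, model_cdf)
    return [(0.0, 0.0)] + list(zip(u, w)) + [(1.0, 1.0)]

def change_pp(sample, model_cdf):
    """Vertices of the Change PP plot: (u_j, w_j - u_j), zero at both ends."""
    u, w = pp_plot_values(sample, model_cdf)
    return [(0.0, 0.0)] + [(uj, wj - uj) for uj, wj in zip(u, w)] + [(1.0, 0.0)]

def uniform_cdf(y):
    """Hypothetical model: Uniform[1, 3]."""
    return min(1.0, max(0.0, (y - 1.0) / 2.0))

sample = [1, 2, 2, 3]  # hypothetical data
vertices = quantile_type_pp(sample, uniform_cdf)
changes = change_pp(sample, uniform_cdf)
```

The distribution-type PP plot uses the same vertices with the two coordinates exchanged. For this sample and model the PP plot lies on the diagonal, so the Change PP plot is identically zero.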

7.  Maximum  Spacings  method  of  one  sample  parameter  estimation 

Regular maximum likelihood estimators $\hat\theta$ are parameter values minimizing
the negative of the average log likelihood

$$-L(\theta) = (1/n) \sum_{t=1}^{n} -\log f_\theta(Y(t; n)).$$

A maximum spacings estimator, also denoted $\hat\theta$, minimizes

$$\sum_{j=1}^{c} -(u_j - u_{j-1}) \log\left(F_\theta(Q^{c(n)}(u_j)) - F_\theta(Q^{c(n)}(u_{j-1}))\right).$$

Maximum spacings estimators have been discussed by Cheng and Iles
(1987), Ranneby (1984), Cheng and Amin (1983), Titterington (1985); they
could be called Maximum Grouped Likelihood estimators. Maximum spacings
estimators can be shown to provide credible estimators in non-regular
cases (where likelihood is unbounded and thus maximum likelihood does not
provide a satisfactory estimator) and to provide efficient estimators in regular
cases (they have the same properties as maximum likelihood estimators).

Goal Eleven of this paper is to note that (1) maximum spacings estimators
can be represented in terms of comparison density functions whose
neg-entropy is minimized to find parameter estimators:

$$2 \int_0^1 -\log d^c(u; F^{(n)}, F_\theta)\,du,$$

and (2) adapting Beran (1977) one can obtain robust parameter estimators
by combining maximum spacings estimations with minimum information
estimation criteria.
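
To make the neg-entropy criterion concrete, the Python sketch below evaluates the integral of -log of the comparison density for a quantile-type PP plot that is piecewise linear from (0, 0) through the PP-plot values to (1, 1), so that the density on each interval is a ratio of increments. Fixing the end values of the transformed data at 0 and 1, the exponential model, the grid search, and the sample are all assumptions made for illustration; they are not prescribed by the paper.

```python
from collections import Counter
from math import exp, log

def spacings_criterion(sample, model_cdf):
    """Sum over j of -(u_j - u_{j-1}) * log((w_j - w_{j-1}) / (u_j - u_{j-1})),
    the integral over (0, 1) of -log of a piecewise constant comparison density."""
    n = len(sample)
    counts = Counter(sample)
    v = sorted(counts)
    u, running = [0.0], 0
    for value in v:
        running += counts[value]
        u.append(running / n)
    # Assumption: end values 0 and 1, matching the PP-plot endpoints (0, 0), (1, 1)
    w = [0.0] + [model_cdf(0.5 * (a + b)) for a, b in zip(v, v[1:])] + [1.0]
    total = 0.0
    for j in range(1, len(u)):
        du = u[j] - u[j - 1]
        dw = w[j] - w[j - 1]
        total -= du * log(dw / du)
    return total

def fit_exponential_rate(sample, grid):
    """Crude grid search for the rate of a hypothetical exponential model."""
    def criterion(rate):
        return spacings_criterion(sample, lambda y: 1.0 - exp(-rate * y))
    return min(grid, key=criterion)

sample = [0.2, 0.5, 0.9, 1.4, 2.3, 3.1]          # hypothetical data
grid = [0.05 + 0.01 * k for k in range(296)]     # candidate rates 0.05 .. 3.00
rate_hat = fit_exponential_rate(sample, grid)
```

The criterion is non-negative and equals zero exactly when the PP plot lies on the diagonal, which is why minimizing it pulls the fitted model toward the data. A grid search is used only to keep the sketch dependency-free; a numerical optimizer would normally replace it.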

We propose for investigation minimum information estimators (more
precisely, minimum Renyi information of index $\lambda$ estimators),
defined to minimize

$$IR_\lambda(d^c(u; F^{(n)}, F_\theta)).$$

Minimum information estimators satisfy the estimating equations

$$\int_0^1 \{d^c(u; F^{(n)}, F_\theta)\}^{1+\lambda}\, S_{\theta_j}(Q^{c(n)}(u), \theta)\,du = 0.$$

Regular maximum likelihood estimators correspond to $\lambda = -1$. Minimum
information estimators of index $\lambda$ are of interest because they provide
robust estimators in the presence in the data of values not fitting the assumed
parametric probability model (see Beran (1977)). To test if they should be
computed in preference to regular maximum likelihood estimators one could
test if the latter satisfy the estimating equations of the former. Research on
these ideas is continuing.

8.  Example  of  Change  PP  Analysis 

The introductory statistics textbook by Freedman et al. (1978) discusses
a data set consisting of 100 measurements made at the National Bureau of
Standards on the weight of NB 10. It is very interesting because it appears
to follow a normal distribution with outliers. One can obtain this conclusion
by an exploratory analysis (described below) or by robust estimation of
parameters of a normal distribution using minimum information estimators.

Each measurement in the sample is the number of micrograms below ten
grams. The sample standard deviation is approximately 6 micrograms (the
maximum likelihood estimator). But a normal distribution with parameters
equal to the sample mean and standard deviation does not pass tests of
goodness of fit to the data. A trimmed sample (trimmed to omit the smallest
and largest values) has standard deviation of approximately 4 micrograms (the
robust estimator) and is fit by a normal distribution.

To compare the whole sample with a probability model, we first compute
the sample quantile function $Q^{(n)}$ of data normalized by subtracting the sample
mean and dividing by the sample standard deviation. Figure 1 compares $Q^{(n)}$
with the standard normal quantile function; we intuitively perceive that
their slopes at u = .5 differ, indicating that the true scale parameter of the
data is not well estimated by the sample standard deviation. Figure 2 compares
to the standard normal the sample quantile function of the normalized
trimmed sample; we perceive a fit.

Next we compute (what we have denoted by $w_j$) the mid distinct values
of the normalized samples transformed by the standard normal distribution.
We compare $w_j$ to the sample cumulative frequencies $u_j$. A PP plot graphs
the linear interpolation of $(u_j, w_j)$. A Change PP plot graphs the linear
interpolation of $(u_j, n^{1/2}(w_j - u_j))$. Under the null hypothesis of goodness
of fit, the Change PP plot should be a sample path of a Brownian Bridge
process, modified by the effect of parameter estimation. The asymptotic
95% significance level (found by simulation) is .97.
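
The steps of this paragraph can be sketched in Python as follows. The data below are hypothetical (they are not the NB 10 measurements), the function names are illustrative, and comparing the maximum absolute value of the Change PP plot with .97 is our reading of how the simulated level is meant to be used.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, pstdev

CRITICAL_95 = 0.97  # simulated asymptotic level quoted in the text

def change_pp_normal(sample):
    """Points (u_j, sqrt(n) * (w_j - u_j)), j = 1..c-1, for a normal model
    fitted by the sample mean and the divisor-n standard deviation."""
    n = len(sample)
    m, s = mean(sample), pstdev(sample)          # maximum likelihood estimates
    counts = Counter((y - m) / s for y in sample)
    z = sorted(counts)                           # distinct normalized values
    phi = NormalDist().cdf
    points, running = [], 0
    for a, b in zip(z, z[1:]):
        running += counts[a]
        u_j = running / n                        # cumulative relative frequency
        w_j = phi(0.5 * (a + b))                 # transformed mid distinct value
        points.append((u_j, sqrt(n) * (w_j - u_j)))
    return points

def max_abs_change(sample):
    """Largest absolute ordinate of the Change PP plot."""
    return max(abs(change) for _, change in change_pp_normal(sample))

sample = [2.1, 3.4, 3.9, 4.4, 4.8, 5.0, 5.3, 5.9, 6.2, 7.7]  # hypothetical data
statistic = max_abs_change(sample)
lack_of_fit = statistic > CRITICAL_95
```

Plotting the returned points with linear interpolation gives the Change PP plot itself; the shape of the plot, and not only its maximum, carries diagnostic information.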

The Change PP plot of the whole sample in Figure 3 indicates lack
of fit because of its maximum (which is 1.12) and its shape (which can be
interpreted by an experienced analyst as a canonical shape indicating that the
probability integral transformed data has a probability density whose graph
looks like a bowl, implying outliers in the original data). This conclusion
is reached with a minimum of computation; it would also be reached by a
computer intensive density estimation analysis of the PP plot.

The Change PP plot of the trimmed sample in Figure 4 indicates fit (of
the trimmed sample by the normal with parameters equal to the maximum
likelihood estimators from the trimmed sample) because of its maximum
and its shape (which can be interpreted as a canonical shape whose derivative
is a constant function, indicating the transformed data has a uniform
distribution).